Background: Tissue protein expression profiling has the potential to detect new biomarkers to improve breast cancer (BC) 
diagnosis, staging, and prognostication. This study aimed to identify tissue proteins that differentiate breast cancer tissue from 
healthy breast tissue using protein chip mass spectrometry and to examine associations with conventional pathological features. 

Methods: To develop a training model, 82 BC and 82 adjacent unaffected tissue (AT) samples were analysed on cation-exchange 
protein chips by time-of-flight mass spectrometry. For validation, 89 independent BC and AT sample pairs were analysed. 

Results: From the protein peaks that were differentially expressed between BC and AT by univariate analysis, binary logistic 
regression yielded two peaks that together classified BC and AT with a ROC area under the curve of 0.92. Two proteins, ubiquitin 
and S100P (in a novel truncated form), were identified by liquid chromatography/tandem mass spectrometry and validated by 
immunoblotting and reactive-surface protein chip immunocapture. The combined marker panel was positively associated with 
high histologic grade, larger tumour size, lymphovascular invasion, ER and PR positivity, and HER2 overexpression, suggesting 
that it may be associated with a HER2-enriched molecular subtype of breast cancer. 

Conclusion: This independently validated protein panel may be valuable in the classification and prognostication of breast cancer 
patients. 



Breast cancer is the most frequently diagnosed cancer, and the 
leading cause of cancer death, in women worldwide (Jemal et al, 
2011), with the lifetime risk of developing breast cancer estimated 
to be 1 in 8 in Western countries (Feuer et al, 1993). Patient 
survival has increased steadily over recent decades, attributable in 
part to advances in both mammographic screening (Kopans, 2011) 
and adjuvant systemic treatment protocols (Peto et al, 2012). 
Whereas pathological features such as tumour size, node positivity, 
hormone receptor positivity, and human epidermal growth factor 
receptor 2 (HER2) overexpression have been used to guide 
clinicians' prescription of adjuvant therapy, true personalised
medicine requires the development of better biomarkers of risk and
response to therapy.

Gene expression profiling is emerging as a tool for classifying 
breast cancers, guiding therapy, and predicting treatment 
responses (Cheang et al, 2008; Haas et al, 2011). However, genome 
and transcriptome analyses alone provide only a partial picture, as 
alternative splicing of mRNA, combined with more than 100 
unique post-translational protein modifications, mean that each 
gene may give rise to multiple protein species (Banks et al, 2000). 

Analysing the proteome may provide a more dynamic reflection 
of the impact of the cell's genetic programme on its immediate 
environment (Aebersold et al, 2005). Cancer proteomics encompasses
the identification and quantitative analysis of differentially
expressed proteins relative to healthy tissue counterparts at 
different stages of disease. Proteomic technologies can also be 
used to identify markers for cancer diagnosis, to monitor disease 
progression and efficacy of therapy, and to identify new therapeutic 
targets (Srinivas et al, 2001). 

Surface-enhanced laser desorption/ionisation time-of-flight 
(SELDI-TOF) mass spectrometry (MS) is a high-throughput
proteomic method that involves solid-phase extraction of subsets 
of the proteome before analysis by TOF MS (Callesen et al, 2008). 
It has the ability to rapidly analyse hundreds of samples, essential 
for obtaining biologically and statistically relevant data in medical 
proteomic research. A recent review of protein profiling studies of 
breast cancer demonstrates that, despite a considerable diversity 
among these studies, there is a pattern of conformity developing, 
with increasing numbers of studies reporting similar peaks in 
protein profiles (Galvao et al, 2011). This suggests convergence to a 
set of common discriminatory peaks for breast cancer, with 
reproducibility across different clinical studies. 

In this study we have employed SELDI-TOF MS to discover 
tissue biomarkers of breast cancer, and validate them on an 
independent sample set. We have used two immunological 
methods to verify the identified proteins. Finally, the expression 
levels of these proteins have been associated with clinical 
pathological variables in order to explore their potential value in 
breast cancer classification and prognosis. 



MATERIALS AND METHODS 



Patient samples. The study involved 404 patient samples 
comprising 202 pairs: breast tumour tissue (BC) and adjacent 
unaffected breast tissue (AT) from each subject. For the discovery 
phase, 102 sample pairs were obtained from the Rolling Institute 
Breast Tumour Bank at Royal North Shore Hospital, Sydney, 
Australia. For independent validation, 100 sample pairs were 
provided by the Australian Breast Cancer Tissue Bank, Sydney, 
Australia. All breast tissue samples were collected on the day of
surgery with prior informed consent, and the study was approved 
by the Human Research Ethics Committee of the Northern Sydney 
Central Coast Area Health Service, Sydney, Australia. At the time 
of surgical resection, tissues were immediately taken to a 
pathologist, who sampled both the tumour itself and adjacent 
tissue of normal appearance. Both samples were snap-frozen in 
liquid nitrogen within 20 min of resection and stored at −80 °C.
Oestrogen receptor (ER) and progesterone receptor (PR) were 
scored as either negative or positive by immunohistochemistry, 
using rabbit monoclonal SP1 (Biocare Medical, Concord, CA, 
USA) and mouse monoclonal Clone PgR636 (Dako, Carpinteria, 
CA, USA), respectively. The HER2 status was defined as positive or 
negative by immunohistochemistry using the HercepTest (Dako). 
Any equivocal result using this test was confirmed by FISH. 

Tissue preparation. Approximately 20 mg of each tissue
sample (BC or AT) was prepared for proteomic analysis by
grinding with a mortar and pestle while frozen in liquid
nitrogen, and then solubilising in 10 volumes of lysis buffer
(9.5 M urea, 2% 3-[(3-cholamidopropyl)dimethylammonio]-1-propanesulfonate
(CHAPS), and 1% dithiothreitol). Lysates were
added to a QiaShredder spin column (Qiagen, Hilden, Germany)
and centrifuged (12 000 r.p.m., 5 min) to remove insoluble material.
Samples were applied to weak cation-exchange (CM10) protein
chips (Bio-Rad Laboratories, Hercules, CA, USA) for immediate
analysis as described below, or aliquoted and stored at −80 °C for
future analysis. The protein concentration of each extract was
determined by BCA Protein Assay (Thermo Scientific, Rockford,
IL, USA).

Preparation of protein chips. The CM10 protein chips were
pre-equilibrated twice with 5 µl of binding buffer (50 mM sodium
acetate, pH 6.0) for 5 min. Protein extracts were diluted 1:5 with
binding buffer and 5 µl of each diluted extract was pipetted onto
the chips. All samples were run in duplicate. Chips were then
incubated with shaking using a MicroMix 5 (settings: form 20,
amplitude 4; EURO/DPC Instrument Systems, Flanders, NJ, USA)
for 90 min at room temperature. Each spot was treated with
2 × 1 µl of 50% α-cyano-4-hydroxycinnamic acid in 50% acetonitrile
containing 0.5% trifluoroacetic acid (TFA), and air dried.

Generation of MS profiles. Protein profiles were initially obtained
using a PBSIIc protein chip reader (Bio-Rad Laboratories,
Hercules, CA, USA), and in the latter part of the study, a SELDI
Enterprise Edition protein chip reader (Bio-Rad). Mass spectra
were generated for each sample in the mass/charge (m/z) range of
1000-30 000 with a laser intensity setting of 175 (arbitrary units).
The laser was optimised for data collection between 1000 and
15 000 m/z, with detector sensitivity set at 8. Peaks <1000 m/z
were deflected away from the detector. Data were averaged from
328 spectra evenly distributed across each spot. Mean values from
duplicate spectra for each sample were used in all subsequent
analyses. The m/z value for each peak was determined using
external calibration with protein standards including bovine
insulin (5734.51 Da), equine cytochrome c (12 362 Da), equine
apomyoglobin (16 952.3 Da), and bovine carbonic anhydrase
(29 023.70 Da; Sigma-Aldrich, St Louis, MO, USA). After calibration,
spectra were baseline-subtracted and normalised using the
total ion current between 1500 and 15 000 m/z. Spectra that
required a normalisation factor of >2 were repeated, and if the
high normalisation factor persisted, these data were discarded.
Peak detection was initially performed using Biomarker Wizard
Version 3.2.2 (Bio-Rad Laboratories) on all peaks with signal/noise
ratio ≥ 5 and present in at least 10% of all spectra. Subsequently, all
MS spectra were exported to ProteinChip Data Manager v4.1, used
with the ProteinChip SELDI System Enterprise Edition (Bio-Rad),
to refine the combined data analysis.
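
The total ion current (TIC) normalisation and the rejection rule described above can be illustrated with a minimal sketch. It assumes that the normalisation factor of a spectrum is the mean TIC of the batch divided by that spectrum's own TIC; the vendor software may define it differently. The function names and the toy spectra are hypothetical.

```python
def tic(spectrum, lo=1500.0, hi=15000.0):
    """Total ion current: summed intensity of points with lo <= m/z <= hi."""
    return sum(intensity for mz, intensity in spectrum if lo <= mz <= hi)


def normalise(spectra, max_factor=2.0):
    """Scale every spectrum so that its TIC equals the batch mean TIC.

    Assumption: factor = mean TIC / spectrum TIC. Spectra whose factor
    exceeds max_factor are flagged for repeat acquisition.
    Returns (normalised_spectra, factors, flagged_indices).
    """
    tics = [tic(s) for s in spectra]
    mean_tic = sum(tics) / len(tics)
    factors = [mean_tic / t for t in tics]
    normalised = [
        [(mz, intensity * f) for mz, intensity in s]
        for s, f in zip(spectra, factors)
    ]
    flagged = [i for i, f in enumerate(factors) if f > max_factor]
    return normalised, factors, flagged


# Hypothetical toy spectra as (m/z, intensity) points.
spectra = [
    [(1200.0, 50.0), (2000.0, 10.0), (8558.0, 30.0)],  # TIC 40 (1200 is outside the window)
    [(2000.0, 12.0), (8558.0, 28.0)],                  # TIC 40
    [(2000.0, 3.0), (8558.0, 7.0)],                    # TIC 10, weak spectrum
]
normalised, factors, flagged = normalise(spectra)
print(factors, flagged)
```

In this toy batch the third spectrum needs a factor of 3 and would therefore be repeated under the rule stated in the text.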

MS data analysis. Data analysis was designed in three stages. For
initial discovery, biomarker panels were developed on the training
data set using 102 BC and AT sample pairs. Cluster analysis was
performed using Biomarker Wizard version 3.2.2 (Bio-Rad).
Univariate analysis of individual peaks was performed by
Mann-Whitney U-test using SPSS (Version 18.0, SPSS Inc., Chicago, IL,
USA). All protein peaks that significantly discriminated BC from
AT at P<0.001 were then subjected to multivariate analysis using
forward and reverse binary logistic regression (SPSS) to develop
the training model. The discriminatory power of each putative
marker was further described using receiver operating characteristic
(ROC) area-under-the-curve (AUC) analysis. To test protein
panels that were best able to discriminate BC from AT, 10-fold
internal cross-validation was used as previously described
(Ambroise and McLachlan, 2002; Scarlett et al, 2006). External
validation was carried out using an independent set of 100 paired BC
and AT samples.
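
The two univariate measures used here are closely related: for a single peak, the ROC AUC equals the Mann-Whitney U statistic divided by the number of BC-AT comparisons. The sketch below shows that relationship in plain Python. The intensities are hypothetical and the helper names are illustrative; this is not the SPSS procedure used in the study.

```python
def mann_whitney_u(pos, neg):
    """U statistic for `pos`: count of (pos, neg) pairs with pos > neg.

    Tied pairs contribute 0.5 each.
    """
    u = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                u += 1.0
            elif p == n:
                u += 0.5
    return u


def roc_auc(pos, neg):
    """ROC AUC of a single marker, computed as U / (n_pos * n_neg)."""
    return mann_whitney_u(pos, neg) / (len(pos) * len(neg))


# Hypothetical normalised peak intensities for one m/z cluster.
bc = [4.1, 3.8, 5.0, 2.9, 4.4]  # tumour tissue
at = [2.0, 3.1, 2.5, 3.8, 1.7]  # adjacent unaffected tissue

print(mann_whitney_u(bc, at), roc_auc(bc, at))
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation of the two tissue groups.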

After external validation, to consolidate and unify the initial
discovery and validation data, a further analysis was performed on
the combined data sets. This coincided with the acquisition of new
peak cluster analysis software, ProteinChip Data Manager Version
4.1 (Bio-Rad). Similar to the initial discovery phase, both univariate
analysis using nonparametric statistics and multivariate analysis
using binary logistic regression were applied, confirming a final
two-protein marker panel and allowing calculation of overall
estimates of sensitivity and specificity, accuracy, and ROC
AUC values. The final stage of data analysis was to re-evaluate
the two-protein panel on the separate training and validation sets
to ensure consistency between the findings from the new and
original software packages. In this re-testing, all common peaks
obtained from the combined data set study were used for each
regression analysis to achieve classification of tumour samples
separately in the training and validation sets.

Protein identification. For purification of the putative biomarkers,
tissue lysates were fractionated using a cation-exchange resin
(Mustang S, Pall Corp., Ann Arbor, MI, USA) with stepwise pH
elution from pH 4 to pH 9 in a 96-well filter plate format
(AcroPrep, Pall) as previously described (Chung et al, 2009).
Proteins of interest in the eluates were monitored by SELDI-TOF
MS on normal-phase (NP20) chips. Fractions containing an
~8.5 kDa putative biomarker were further purified using
reverse-phase liquid chromatography (LC) on a 250 × 4.6 mm Jupiter 5 µm
300-Å C18 column (Phenomenex, Lane Cove, Australia), eluted
with a 35-min linear gradient from 15% to 60% acetonitrile in 0.1%
TFA at 1.5 ml min⁻¹, followed by separation on 12% SDS-PAGE
detected with SYPRO ruby protein stain (Invitrogen, Eugene, OR,
USA). Protein bands of interest were excised from the gel and
analysed using both nanoLC-ESI-MS/MS and MALDI-TOF
peptide mass fingerprinting by the Australian Proteome Analysis
Facility (Macquarie University and University of New South
Wales, Sydney, Australia). The protein peak at 9.2 kDa was purified
and identified in a similar manner.
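
For orientation, the linear gradient stated above (15% to 60% acetonitrile over 35 min at 1.5 ml min⁻¹) can be expressed as a small calculation. This is only a sketch with hypothetical helper names; it ignores system dwell volume and any hold or re-equilibration steps, which the text does not describe.

```python
def percent_acetonitrile(t_min, start=15.0, end=60.0, duration=35.0):
    """Acetonitrile percentage at time t_min of a linear gradient."""
    if not 0.0 <= t_min <= duration:
        raise ValueError("time lies outside the gradient")
    return start + (end - start) * t_min / duration


def volume_delivered_ml(t_min, flow_ml_per_min=1.5):
    """Mobile-phase volume pumped after t_min minutes at constant flow."""
    return flow_ml_per_min * t_min


slope = (60.0 - 15.0) / 35.0  # about 1.29 % acetonitrile per minute
print(slope, percent_acetonitrile(17.5), volume_delivered_ml(35.0))
```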

Immunological validation of protein markers. To detect ubiquitin
and S100P by western blotting, BC and AT tissue extracts were
separated by 12% SDS-PAGE and transferred to PVDF membranes
(Bio-Rad). Membranes were blocked for 1 h at room
temperature with 5% skim milk. Ubiquitin was detected by
incubating the transferred membranes for 2 h at room temperature
on a shaking platform with anti-human ubiquitin monoclonal
antibody (R&D Systems, Minneapolis, MN, USA) in a 1:500
dilution in 5% skim milk. For S100P western blotting, samples
were concentrated five-fold by centrifugal ultrafiltration with
3-kDa MW cutoff (Nanosep 3K Omega, Pall Corp.) before
electrophoresis. This was necessary to increase detection sensitivity.
Concentrated samples were separated and transferred, and
membranes blocked, as described above, and S100P was detected
by incubating overnight at 4 °C with rabbit anti-human antibody
(Invitrogen) in a 1:500 dilution in 5% skim milk. Secondary
antibody, peroxidase-linked anti-rabbit IgG (1:2000), was added
for 1 h at room temperature and the protein bands were visualised
by enhanced chemiluminescence using the SuperSignal West Pico
Luminol/Enhancer solution (Thermo Scientific). Western blot data
were imaged using the LAS 3000 imaging system (Fujifilm,
Stamford, CT, USA) and the images were analysed with
MultiGauge version 3.0 software (Fujifilm). The quantitative data were
normalised to the loading control of β-actin, and analysed using
the Wilcoxon signed-rank test (SPSS).
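
The paired comparison of densitometry values can be sketched as an exact Wilcoxon signed-rank test, which is feasible by enumeration for eight sample pairs. The data below are hypothetical, the function name is illustrative, and the sketch assumes no zero differences and no tied absolute differences. SPSS may report an asymptotic P-value instead, so figures from this sketch need not match those in the paper.

```python
from itertools import product


def signed_rank_exact(before, after):
    """Exact two-sided Wilcoxon signed-rank test for paired values.

    Assumes no zero differences and no ties among absolute differences.
    Returns (w_plus, p_value), where w_plus is the rank sum of the
    positive differences.
    """
    diffs = [a - b for a, b in zip(after, before)]
    if any(d == 0 for d in diffs):
        raise ValueError("zero differences are not handled in this sketch")
    ordered = sorted(abs(d) for d in diffs)
    if len(set(ordered)) != len(ordered):
        raise ValueError("tied absolute differences are not handled")
    rank = {value: i + 1 for i, value in enumerate(ordered)}
    w_plus = sum(rank[abs(d)] for d in diffs if d > 0)

    n = len(diffs)
    total = n * (n + 1) // 2
    observed = min(w_plus, total - w_plus)
    # Under the null hypothesis every sign pattern is equally likely.
    extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(range(1, n + 1), signs) if s)
        if min(w, total - w) <= observed:
            extreme += 1
    return w_plus, extreme / 2 ** n


# Hypothetical actin-normalised band intensities for eight pairs.
at = [10, 12, 9, 15, 11, 14, 8, 13]
bc = [13, 17, 10, 22, 15, 20, 16, 15]
print(signed_rank_exact(at, bc))
```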

To confirm the identity of the m/z 8558 protein peak by protein
chip immunocapture, pre-activated RS100 protein chips (Bio-Rad)
were pre-coupled with 2 µg of monoclonal anti-human ubiquitin
antibody (R&D) in 50 mM NaHCO3 buffer (pH 9.2) at 4 °C. The
spots were washed with 50 BSA to block the remaining active
sites. Tissue lysates were diluted 1:5 in buffer containing 50%
human serum in 0.1% Triton X-100 in PBS, spotted onto RS100
protein chips, and incubated for 2 h at room temperature on a
shaker to achieve optimal binding. After washing with PBS, all
spots were rinsed with 50 mM Tris-HCl, 1 M urea, 0.1% CHAPS, and
0.5 M NaCl, pH 7.2. After further washing in 5 mM HEPES, pH 7.2,
the spots were coated with 2 × 1 µl of 50% sinapinic acid in 50%
acetonitrile, 0.5% TFA, and air dried. The chips were then analysed
on the SELDI-TOF MS. A His-tagged recombinant ubiquitin
standard (10.6 kDa; R&D) was used as a control. The m/z 9226
protein peak was similarly verified using RS100 protein chips to
confirm its identity as S100P. Before protein chip preparation, all
tissue extracts were pre-concentrated as described above for
western blotting. The RS100 protein chips were pre-coupled with
2 µg of rabbit anti-human S100P antibody (Invitrogen) in 50 mM
NaHCO3 buffer (pH 9.2) at 4 °C. The samples were then treated
and analysed as described above. His-tagged recombinant S100P
(12.6 kDa; Novus Biologicals, Littleton, CO, USA) was used as a
control.

Statistical analysis of clinical features. The association between
levels of the two protein markers, individually and in combination,
and tumour pathologic variables (tumour size, histological grade,
lymphovascular invasion, lymph node involvement, ER and PR
status, and HER2 expression) was examined using the
Mann-Whitney U-test (SPSS). Subgroup analyses were also performed, in
which lymph node-negative (n = 84) or lymph node-positive
(n = 85) groups were analysed separately. Significance was set at
P<0.05.



RESULTS 



Patient characteristics. A total of 202 pairs of tissue samples were 
used in this study, generating 808 spectra, of which 684 (duplicate 
spectra on 171 pairs of samples) were subjected to full analysis. Of 
the 102 pairs of samples selected for the training stage, 82 pairs 
were fully analysed. Of the remaining 20 pairs, 8 were excluded 
on clinicopathologic grounds: 4 had DCIS, 2 had neoadjuvant 
treatment, and 2 had recurrent tumours; a further 12 sample pairs 
were excluded when their mass spectra did not meet normalisation 
criteria. For the validation set of 100 sample pairs, 89 pairs
were analysed. Seven sample pairs were excluded on
clinicopathologic grounds: 4 had neoadjuvant therapy, 1 had
metastatic disease, and 2 had recurrent disease; 3 sample pairs were 
lost during preparation; and 1 pair was excluded when the mass 
spectra did not meet normalisation criteria. The median age for the 
patients included in the training and validation sets was 60 (range 
28-92) and 58 (range 27-85), respectively. The clinical pathologic 
characteristics of the tumours including histologic type and grade, 
size, presence of lymphovascular invasion (LVI), hormone receptor 
(ER and PR), HER2 status as well as lymph node status are 
presented in Table 1. 
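
The sample accounting in this paragraph can be checked with a few lines of arithmetic. All counts are taken from the text; the dictionary keys and helper names are illustrative only.

```python
REPLICATES = 2        # each sample was run in duplicate
TISSUES_PER_PAIR = 2  # one BC and one AT sample per patient


def analysed_pairs(selected, exclusions):
    """Pairs remaining after removing every excluded pair."""
    return selected - sum(exclusions.values())


def spectra(pairs):
    """Number of spectra acquired for a given number of sample pairs."""
    return pairs * TISSUES_PER_PAIR * REPLICATES


training = analysed_pairs(102, {
    "dcis": 4, "neoadjuvant": 2, "recurrent": 2,
    "failed_normalisation": 12,
})
validation = analysed_pairs(100, {
    "neoadjuvant": 4, "metastatic": 1, "recurrent": 2,
    "lost_in_preparation": 3, "failed_normalisation": 1,
})

print(training, validation)                # 82 89
print(spectra(training + validation))      # 684 spectra fully analysed
print(spectra(102 + 100))                  # 808 spectra in total
```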

Selection of protein biomarker panel by MS-based protein 
profiling. The training set sample pairs (BC and AT) were 
subjected to MS analysis in duplicate to identify putative protein 
biomarkers that could distinguish tumour from unaffected tissue. 
The 82 sample pairs whose spectra were amenable to normalisation 
yielded 328 spectra, from which 53 common peaks were 
determined by clustering analysis. Of these, 14 peaks (m/z 1337, 
1705, 1842, 2033, 3790, 3804, 8346, 8548, 8599, 9205, 9239, 9292, 
9641, and 12 220) were significantly differentially expressed 
(P< 0.005, Mann-Whitney test). These individual putative 
biomarkers had ROC AUC values ranging from 0.70 to 0.84. 
The 14 peaks were tested in forward and reverse binary logistic 
regression analysis with 10-fold cross-validation. This produced a 
final panel of 3 peaks (m/z 1842, 8599, and 9292) that classified BC 
and AT, with ROC AUC of 0.87, as shown in Figure 1A (curve Ti). 
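
A 10-fold cross-validation partition of the 82 training pairs might be built as sketched below. Splitting by patient pair, so that the BC and AT samples of one patient always fall in the same fold, is a design choice of this sketch and is not stated in the paper. The function name and the seed are hypothetical.

```python
import random


def k_fold_splits(n_items, k=10, seed=0):
    """Return k (train, test) index lists covering range(n_items).

    Items are shuffled once with a fixed seed and dealt into k folds whose
    sizes differ by at most one; each fold serves once as the test set.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


splits = k_fold_splits(82)  # 82 analysed training pairs
print(sorted(len(test) for _, test in splits))
```

With 82 pairs this gives two folds of nine pairs and eight folds of eight pairs; the model would be refitted on each training portion and scored on the held-out fold.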

Independent validation. The three putative biomarkers were 
tested using an independent validation set of 100 sample pairs, of 
which 89 pairs of spectra (in duplicate, 356 spectra) could be 
analysed after normalisation. For the validation set, 57 common 
protein peaks were determined by clustering analysis. Testing 
the three-protein panel derived from the training set on the 
independent sample set of 89 BC and 89 AT samples gave a 
ROC AUC of 0.91 (Figure 1A, curve Vi). The sensitivity and
specificity were 80.9% and 91%, respectively, and overall accuracy
was 90%.

Table 1. Patient characteristics

| Characteristics | Training set | Validation set |
|---|---|---|
| No. of patients | 82 | 89 |
| Age (median) | 60 | 58 |
| Histologic type | | |
| Ductal (IDC) | 68 | 76 |
| Lobular (ILC) | 10 | 10 |
| Other | 4 | 3 |
| Histologic grade | | |
| Grade 1 | 7 | 11 |
| Grade 2 | 32 | 27 |
| Grade 3 | 43 | 49 |
| Missing | | 2 |
| Tumour size | | |
| <2 cm | 29 | 28 |
| ≥2 cm | 53 | 59 |
| Missing | | 2 |
| Oestrogen receptor | | |
| Positive | 56 | 64 |
| Negative | 25 | 23 |
| Missing | 1 | 2 |
| Progesterone receptor | | |
| Positive | 44 | 54 |
| Negative | 38 | 33 |
| Missing | 0 | 2 |
| HER2 overexpression | | |
| Positive | 15 | 16 |
| Negative | 57 | 63 |
| Missing | 10 | 10 |
| Lymphovascular invasion | | |
| Present | 34 | 35 |
| Absent | 48 | 54 |
| Lymph node involvement | | |
| Positive | 42 | 43 |
| Negative | 40 | 44 |
| Missing | | 2 |

Abbreviations: HER2 = human epidermal growth factor receptor 2; IDC = invasive ductal
carcinoma; ILC = invasive lobular carcinoma.

Re-analysis of combined data sets. To increase the statistical 
power of the training and validation analyses and confirm the 
results using a newer software version, we combined the data sets 
into a single analysis of all 171 breast tissue sample pairs. Using 
new clustering analysis software, ProteinChip Data Manager 
Version 4, we found 28 peaks common to all spectra in the m/z 
range of 2500 to 15 000. Peaks of lower mass were excluded from 
this analysis because the putative marker at m/z 1842 had been 
determined by LC-MS/MS to be non-peptide in nature (data not 
shown). By univariate analysis (Mann-Whitney), the significant
peaks (P<0.001) were selected with the additional criterion that
individual ROC AUC was at least 0.80, as summarised in Table 2. 



Multivariate analysis using binary logistic regression again
confirmed the two protein markers at m/z 8558 and 9226. The
difference in m/z values from those determined in the initial
training set analysis (m/z 8599, 9292) is larger than expected and
may be attributable to the fact that they are averaged from 684
spectra (171 sample pairs in duplicate) rather than 328 spectra (82
sample pairs in duplicate), re-calibration of standard curves
between the initial and subsequent analyses, the use of different
analysis software, and the relative mass inaccuracy of this
technique. Both protein peaks were elevated in BC tissue relative
to AT. The sensitivity and specificity for the binary classification
using the combined 2-marker panel were 77.2% and 88.9%,
respectively, with a ROC AUC value of 0.92 (Figure 1B, curve C).
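
Sensitivity and specificity follow directly from the classification counts. The paper reports percentages only, so the counts in the sketch below are back-calculated assumptions: 132 of 171 BC samples and 152 of 171 AT samples correctly classified reproduce the reported 77.2% and 88.9%.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion counts."""
    sensitivity = tp / (tp + fn)            # BC samples called BC
    specificity = tn / (tn + fp)            # AT samples called AT
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Assumed counts for the combined set of 171 BC and 171 AT samples.
sens, spec, acc = classification_metrics(tp=132, fn=39, tn=152, fp=19)
print(round(sens * 100, 1), round(spec * 100, 1), round(acc * 100, 1))
```

Because the two tissue groups are the same size, the accuracy under these assumed counts is simply the mean of sensitivity and specificity.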

Re-testing of initial training and validation sets. For final 
confirmation of the potential two-marker panel, it was re-tested on 
the original separate training and validation sets. The sensitivity 
and specificity of the classification for breast tissue biopsy samples 
were 73.2% and 87.8%, respectively, in the training set, compared 
with 80.9% and 91% in the validation set. Their corresponding 
ROC AUC values were 0.86 (curve Tr) and 0.91 (curve Vr) for the 
training and validation sets, respectively (Figure 1C). 

Together, these results suggest that two protein biomarkers in
combination provide efficient discrimination between breast
cancer tissue and healthy tissue. Figures 1D-F demonstrate the
performance of the two protein peaks of m/z 8558 and 9226 alone
and in combination. By paired sample t-test, a significant
difference between BC and AT groups was found for each protein
tested separately (Figures 1D and E, n = 171, P<0.001). For the
two-protein combined panel, the mean value was 3.3-fold
increased in BC compared with AT samples (Figure 1F, n = 171,
P<0.001).

Identification and verification of putative biomarkers. Both
proteins of m/z 8558 and 9226, retained by weak cation-exchange
protein chips, were significantly increased in breast cancer tissue.
For identification, initial purification was carried out using
cation-exchange followed by reversed-phase HPLC. Eluted fractions were
pooled and fractionated by SDS-PAGE, and bands of ~8 kDa
were excised for final identification by LC-MS/MS. Ubiquitin was
identified from 6 peptides (two overlapping), giving 72% sequence
coverage. The calculated mass of monomeric ubiquitin (8560 Da)
was in good agreement with the consensus mass obtained
experimentally with SELDI (m/z 8558). Similarly, analysis of the
marker of ~9.2 kDa identified it as a fragment or variant of S100P
(10 400 Da) from two peptides, giving 24% sequence coverage
relative to full-length S100P (Supplementary Figure S1). Notably,
the two peptides found in this study were identical to those
previously used to identify S100P in a MALDI-MS study of
proteins upregulated in colorectal cancer (Lam et al, 2010).
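
The agreement between calculated and observed masses can be quantified as a relative error. The sketch below uses only the masses quoted in the text and treats the singly charged m/z value as the mass in Da, which is an approximation; the function name is hypothetical.

```python
def mass_error_ppm(observed, theoretical):
    """Relative mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


# Ubiquitin: calculated 8560 Da versus SELDI consensus m/z 8558.
ubiquitin_ppm = mass_error_ppm(8558, 8560)

# S100P: full-length mass quoted as 10 400 Da versus the 9226 peak.
s100p_shortfall_da = 10400 - 9226

print(round(ubiquitin_ppm, 1), s100p_shortfall_da)
```

The ubiquitin peak lies within roughly 0.02% of the calculated mass, whereas the S100P-related peak is about 1.2 kDa lighter than the full-length protein, consistent with the description of a fragment or variant.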

Immunological verification of the two protein identities was
performed using both western blotting and protein chip
immunocapture. For ubiquitin, western blot confirmed differential
expression of this protein between BC and AT tissue extracts. Figure 2A
shows that for BC and AT samples from four randomly selected
patients, relative overexpression of ubiquitin in the cancer samples
was observed. When quantitated and analysed for eight randomly
selected sample pairs, the increase in ubiquitin in BC was
significant (Figure 2B, P = 0.017, Wilcoxon signed-rank test).
The identity of this protein as ubiquitin was also verified by
immunocapture on RS100 protein chips (Figure 2C). The m/z 8558
peak, captured by immobilised ubiquitin antibody and displayed
by SELDI-TOF MS, was increased in two BC samples in
Figure 2Cii and iv compared with their corresponding AT samples
in Figure 2Ci and iii, and absent when the capture antibody was
nonimmune IgG (Figure 2Cvi). Figure 2Cv shows His-tagged
recombinant ubiquitin (10.6 kDa) as a control.
Figure 1. Performance of two protein peaks individually and in combination. (A) The ROC area-under-curve (AUC) after cross-validation was 
0.87 (Ti) for the combination of peaks at m/z 1842, 8599 and 9292. For the independent validation sample set, the average value of ROC AUC was 
0.91 (Vi). (B) Combination of the discovery and validation sets. The sensitivity and specificity of the combination peaks of m/z 8558 and 9226 
were 77.2% and 88.9% with a ROC AUC value of 0.92. (C) Retesting of initial training and validation sets. The ROC AUC values for these tests 
were 0.86 (Tr) and 0.91 (Vr) for training and validation sets, respectively. (D) Mean peak intensity values ± s.e.m. (normal vs cancer) for the marker 
at m/z 8558; (E) mean values ± s.e.m. for the marker at m/z 9226, and (F) mean values ± s.e.m. for the two markers combined. For the comparisons 
in (D-F), n = 171, P<0.001. 



Table 2. Summary of data analysis

| Data analysis stage | Training | Validation | Combination | Retesting training set | Retesting validation set |
|---|---|---|---|---|---|
| No. of patients | 82 | 89 | 171 | 82 | 89 |
| MS profile no. | 164 | 178 | 342 | 164 | 178 |
| ROC AUC | 0.87 | 0.91 | 0.92 | 0.86 | 0.91 |
| Classification | Sens 75.6%, Spec 91.5% | Sens 80.9%, Spec 91% | Sens 77.2%, Spec 88.9% | Sens 73.2%, Spec 87.8% | Sens 80.9%, Spec 90.0% |

Abbreviations: AUC = area under the curve; MS = mass spectrometry; ROC = receiver operating characteristic; Sens = sensitivity; Spec = specificity.



Similarly, the expression of S100P was also examined by western
blot in eight random sets of BC and AT samples. Figure 2D shows
the western blot data for four pairs, indicating variable levels of this
protein between patients, with upregulation in BC samples. When
quantitated and analysed for all eight sample pairs, the increase in
immunoreactive S100P in BC was significant (Figure 2E, P = 0.012,
Wilcoxon signed-rank test). By immunocapture using the same
S100P antibody immobilised on RS100 protein chips, an
apparently truncated form (m/z 9226) of S100P protein was
observed, similar to that found in the discovery programme using
CM10 cation-exchange chips. This peak was more abundant in BC
samples (Figure 2Fii and iv) than in the corresponding AT samples
(Figure 2Fi and iii), and absent when the capture antibody was
nonimmune IgG (Figure 2Fvi). Figure 2Fv shows His-tagged
recombinant S100P (12.6 kDa) as a control.

To further confirm the identity of the 9.22 kDa protein as a
short form of S100P associated with breast cancer, we also isolated
this protein from cell lysates prepared from MCF-7 breast cancer
cells. As shown in Supplementary Figures S2A-C, this protein
could be immunoprecipitated from MCF-7 lysates using three
different S100P antibodies (rabbit monoclonal, mouse polyclonal,
and rabbit polyclonal). Together with the S100P sequence data
(Supplementary Figure S1), this unequivocally confirms its
relationship to S100P. Also visible in the immunoprecipitates
was a smaller peak of 10.48 kDa, presumably representing
full-length S100P. The 9.22 kDa form could be separated from the
full-length protein by further purification on reverse-phase HPLC
(Supplementary Figure S2D).

Association of two protein biomarkers and their combination
with prognostic variables. To investigate the potential prognostic
value of ubiquitin and S100P separately and in combination in
breast cancer, we initially examined the association of each protein
with variables including tumour stage, nodal stage, histologic type
and grade, hormone receptor (ER and PR) and HER2 status, and
LVI. As shown in Table 3, significant positive associations were
seen between expression of the short form of S100P and tumour
size, higher grade, LVI, lymph node involvement, hormone
receptor positive status, and HER2 overexpression, whereas for
ubiquitin a significant association was only seen with tumour size,
grade, and HER2. When analysed together (Table 3), the combined
panel was significantly associated with tumour histologic grade,
size, and LVI, and also with ER-positive (ER+) and PR-positive
(PR+) status and HER2 overexpression (Figure 3).

Figure 2. Immunological validation of ubiquitin and S100P. (A) For ubiquitin, four BC and corresponding AT extracts were analysed by
immunoblotting, indicating relative upregulation of ubiquitin in some breast cancer patients. β-Actin is shown as a loading control. (B) Densitometric
analysis of ubiquitin western blots of eight sample pairs. Box plot shows median and upper and lower quartiles; lines show maximum and minimum
values. P = 0.017, Wilcoxon signed-rank test. (C) Mass spectrometry (MS) spectra of proteins bound to immobilised mouse anti-ubiquitin antibody.
Samples were (i) patient 1 normal tissue, (ii) patient 1 cancer tissue, (iii) patient 2 normal tissue, (iv) patient 2 cancer tissue, (v) recombinant
His-tagged ubiquitin, and (vi) patient 2 cancer tissue, mouse IgG control. Arrow indicates the mass of monomeric ubiquitin, m/z 8558. N = normal
tissue; C = cancer tissue. (D) For S100P, four BC and corresponding AT extracts were analysed by immunoblotting, indicating relative upregulation
of S100P in some breast cancer patients. β-Actin is shown as a loading control. (E) Densitometric analysis of S100P western blots of eight sample
pairs. Box plot shows median and upper and lower quartiles; lines show maximum and minimum values. P = 0.012, Wilcoxon signed-rank test.
(F) Mass spectrometry spectra of proteins bound to immobilised rabbit anti-S100P antibody. Samples were (i) patient 3 normal tissue, (ii) patient 3
cancer tissue, (iii) patient 4 normal tissue, (iv) patient 4 cancer tissue, (v) recombinant His-tagged S100P, and (vi) patient 4 cancer tissue, rabbit
IgG control. Arrow indicates the mass of the S100P form of m/z 9226. N = normal tissue; C = cancer tissue.

As levels of the short form of S100P showed stronger
associations than ubiquitin with each of the pathological indicators
(except for grade), and appeared to point to an ER/PR+,
HER2-overexpressing phenotype (possibly corresponding to a
'HER2-enriched' molecular subtype; Reis-Filho and Pusztai, 2011), we
undertook further analysis of its relationship to these prognostic
features. When examined separately for ER− and ER+ tumours,
high S100P expression in both groups was equally associated with
tumour size and the presence of LVI (not shown). However,
the association between S100P and lymph node involvement was
only significant for ER− tumours (P = 0.010). In contrast, the
association between S100P and HER2 overexpression was only
significant for ER+ tumours (P = 0.004), supporting the concept
that a high S100P level might be associated with a hormone
receptor-positive, HER2-enriched molecular subtype.



When examined separately for lymph node-negative and lymph 
node-positive tumours, the positive association between ubiquitin, 
the short form of S100P, or the combined panel and LVI, ER+ 
status, and PR+ status was entirely attributable to the lymph 
node-positive tumours. A significant relationship between the 
combined panel and HER2 overexpression was also confined to the 
lymph node-positive tumours (Supplementary Table S1). This 
subanalysis again points to a link between high expression 
of the short form of S100P in breast tumours and an ER/PR+, 
HER2-overexpressing phenotype that has been associated with 
markers of poor patient outcome without treatment. However, 
because sample numbers are low in some subanalyses, these 
interpretations should be regarded as preliminary. 

Table 3. Association of two protein markers and their combination with tumour histopathologic variables 

Tumour variables                                   P-value       P-value    P-value
                                                   (ubiquitin)   (S100P)    (combined)

Tumour size (T≤2 cm, n = 57 vs T>2 cm, n = 112)    0.024         0.009      0.008
Grade (G1, n = 18 vs G3, n = 92)                   0.026         0.032      0.016
LVI (present, n = 69 vs absent, n = 102)           0.106         0.011      0.044
ER (positive, n = 120 vs negative, n = 48)         0.059         0.004      0.016
PR (positive, n = 98 vs negative, n = 71)          0.067         0.006      0.022
HER2 (positive, n = 31 vs negative, n = 120)       0.033         0.002      0.009
LN (positive, n = 85 vs negative, n = 84)          0.315         0.027      0.121
Histologic type (IDC, n = 144 vs ILC, n = 20)      0.607         0.765      0.708

Abbreviations: ER = oestrogen receptor; HER2 = human epidermal growth factor receptor 2; IDC = invasive ductal carcinoma; ILC = invasive lobular carcinoma; LN = lymph node; 
LVI = lymphovascular invasion; PR = progesterone receptor. 

Figure 3. Association of the combined panel with histopathologic variables. Higher expression of the combined panel was significantly associated 
with higher histologic grade (P = 0.016) and larger tumour size (P = 0.008), and weakly associated with the presence of LVI (P = 0.044). The panel 
was also relatively increased in tumours that were positive for oestrogen receptors (P = 0.016), progesterone receptors (P = 0.022), and HER2 
overexpression (P = 0.009). Box plots show median and upper and lower quartiles; lines show maximum and minimum values. 



DISCUSSION 



We have used SELDI-TOF MS to discover two proteins that, in 
combination, show high discrimination between breast cancer and 
healthy breast tissue samples. A limitation of the protocol was that 
no microdissection was used, and hence tissue samples could have 
contained heterogeneous cell types. Despite this technical 
limitation, a robust panel of two putative breast cancer biomarkers was 
discovered, and verified on an independent sample set. After 
purification, the proteins were identified by LC-MS/MS as 
ubiquitin and a truncated form of the S100-family member, S100P. 

To discover tissue biomarkers in various cancers, SELDI-TOF 
MS has been used previously, although the majority of such studies 
in breast cancer have examined serum rather than tumour tissue. 
Included among proteins previously identified from breast tumour 
tissue lysates are albumin fragments (Gast et al, 2009) and 
complement C3a (Zhang et al, 2012), both presumably derived 
from the circulation. Tissue proteomic profiling using SELDI-TOF 
MS has also yielded peak clusters that can contribute to the 
classification of breast tumours into molecular subtypes (Brozkova 
et al, 2008; Goncalves et al, 2008) that resemble the luminal A and 
B, basal, and HER2-like subtypes defined by gene expression 
analysis (Reis-Filho and Pusztai, 2011). 

Of the two breast cancer-associated proteins identified in this 
study, ubiquitin is a small protein of 76 amino acids that is 
involved in both apoptotic signalling (Vucic et al, 2011) and 
transcriptional regulation (Hammond-Martel et al, 2011). 
Although monomeric ubiquitin has been identified in several 
previous biomarker studies in breast cancer, its exact relationship 
to disease status is unclear. In a SELDI-TOF MS study of breast 
cancer cell lines, we previously discovered ubiquitin as a strongly 
downregulated protein following treatment with chemotherapeutic 
drugs (Leong et al, 2007). Another SELDI analysis found the 
combination of a high ubiquitin level and low ferritin light chain 
level to be a positive prognostic marker in node-negative breast 
cancer (Ricolleau et al, 2006). In contrast, SELDI was also used to 
show that a protein of similar mass (not identified as ubiquitin) 
was a significant predictive factor for axillary lymph node 
metastasis (Nakagawa et al, 2006). In a MALDI MS analysis of 
microdissected cells from invasive breast cancer and healthy 
(reduction mammoplasty) tissue, ubiquitin was one of a cluster of 
proteins with increased expression in the cancer tissue (Sanders 
et al, 2008). 

Several E3 ubiquitin ligases are regarded as tumour suppressors 
in breast cancer and are either mutated or downregulated; in 
contrast, some others are regarded as oncogenes and are 
overexpressed (Chen et al, 2006). Among key downregulated or 
mutated E3 ligases are BRCA1 and Siah1, involved in DNA repair 
and transcriptional regulation, among other functions. The E3 
ligases downregulated in cancer are involved in both 
monoubiquitination (Hahn et al, 2012) and polyubiquitination (Wen et al, 
2010), and low expression of the E3 ligase Siah1 is associated with 
poorer disease-free survival in women with breast cancer 
(Confalonieri et al, 2009). It may be speculated that the increased 
level of monomeric ubiquitin that we observed to be associated 
with larger tumours, higher grade, and HER2 overexpression, but 
not with other pathological markers (Table 3), reflects a decrease in 
the activity of some key ubiquitin ligase complexes. Interestingly, 
a component of the Siah1 ubiquitination complex, 
calcyclin-binding protein/Siah1-interacting protein (CacyBP/SIP), has 
increased expression in breast cancer tissue compared with 
adjacent unaffected breast tissue, and is associated with 
markers of poor prognosis (Wang et al, 2010). CacyBP/SIP is a 
documented binding partner of S100P (Filipek et al, 2002), 
raising the possibility that the disruption of ubiquitination 
pathways in breast cancer might be involved in the increased 
levels of both of the cancer-related biomarkers discovered in 
our study. 

In contrast to the relatively weak associations observed 
between elevated ubiquitin levels and tumour size, higher grade, 
and HER2 overexpression, a high level of the novel short form of 
S100P was positively associated with larger tumours, higher grade, 
LVI, lymph node involvement, ER/PR positivity, and HER2 
overexpression. Of the two identified biomarkers, S100P made 
the stronger contribution to the association of the combined 
panel with each of these pathological features apart from 
tumour grade. As the association between S100P and HER2 
overexpression was exclusive to the ER+ tumours (P = 0.004), 
and absent in the ER− subgroup, a high tissue S100P level may 
point to a group of tumours with ER/PR+ status, HER2 
overexpression, and - given the association with size, grade, and 
LVI - relatively poor outcome, although our study did not include 
actual outcome variables. This corresponds most closely to the 
'HER2-enriched' breast cancer subtype (Slamon et al, 1987; 
Reis-Filho and Pusztai, 2011), and suggests that S100P might have 
potential, both in the classification of breast cancer and possibly as 
a target for therapy. 
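
The comparison of the two markers against Table 3 can be retraced with a few lines of code. The following is a minimal sketch, not part of the original analysis: the P-values are transcribed from Table 3, the 0.05 threshold is an assumed conventional cut-off, and treating the smaller P-value as the "stronger contribution" is a simplifying assumption made only for illustration. The function names are hypothetical.

```python
# P-values transcribed from Table 3: (ubiquitin, S100P, combined panel).
TABLE3 = {
    "Tumour size": (0.024, 0.009, 0.008),
    "Grade": (0.026, 0.032, 0.016),
    "LVI": (0.106, 0.011, 0.044),
    "ER": (0.059, 0.004, 0.016),
    "PR": (0.067, 0.006, 0.022),
    "HER2": (0.033, 0.002, 0.009),
    "LN": (0.315, 0.027, 0.121),
    "Histologic type": (0.607, 0.765, 0.708),
}

ALPHA = 0.05  # assumed significance threshold, not stated in this section


def significant(marker_index, alpha=ALPHA):
    """Variables whose P-value for the given column (0, 1 or 2) is below alpha."""
    return [name for name, p in TABLE3.items() if p[marker_index] < alpha]


def s100p_smaller_p(alpha=ALPHA):
    """Variables where the combined panel is significant and S100P has the
    smaller single-marker P-value (used here as a rough proxy for contribution)."""
    return [
        name
        for name, (p_ubiquitin, p_s100p, p_combined) in TABLE3.items()
        if p_combined < alpha and p_s100p < p_ubiquitin
    ]


if __name__ == "__main__":
    print("ubiquitin:", significant(0))
    print("S100P:    ", significant(1))
    print("combined: ", significant(2))
    print("S100P has the smaller P-value for:", s100p_smaller_p())
```

Under these assumptions, ubiquitin passes the threshold only for tumour size, grade, and HER2, whereas S100P passes it for every variable except histologic type; grade is the only panel-associated variable for which ubiquitin has the smaller P-value.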

S100P is a member of the calcium-binding S100 protein family, 
whose members contain a characteristic structural domain known as the 
EF hand motif (Marenholz et al, 2004). There are at least 24 
homologous S100 proteins with similar subcellular localisation, but 
differing in expression pattern and function (Marenholz et al, 
2004). The S100 proteins are low-molecular-weight (10-12 kDa) 
acidic proteins that exist as intracellular or secreted homo- or 
hetero-dimers with composition depending on the abundance of 
individual family members and the cellular context 
(Santamaria-Kisiel et al, 2006). Although the factors that regulate S100P have 
not been studied extensively, DNA microarray studies have 
included S100P among panels of genes upregulated by oestradiol 
(Terasaka et al, 2004), progesterone (Bray et al, 2005), and HER2 
overexpression (Mackay et al, 2003). These preliminary gene 
expression reports are consistent with the clinical associations we 
observed between high S100P levels and ER/PR+ and 
HER2-overexpressing tumours. 

Through its effects on tumour growth and metastasis, S100P has 
been associated with the progression of several types of cancer 
including pancreatic, prostate, colorectal, and breast (Lam et al, 
2010; Jiang et al, 2011). At least some of its effects have been shown 
to be mediated through extracellular interaction with RAGE 
(receptor for advanced glycation end products) (Arumugam et al, 
2004). Several studies of pancreatic cancer-related molecular 
profiles have identified S100P as a significantly elevated gene 
(Crnogorac-Jurcevic et al, 2003; Logsdon et al, 2003) whose 
upregulation is an early event in the development of pancreatic 
cancer (Whiteman et al, 2007). In breast cancer, S100P was linked 
to immortalisation of breast epithelial cells in vitro and both 
tumour progression (Guerreiro Da Silva et al, 2000; Schor et al, 
2006) and early relapse (Barraclough et al, 2010) in patients. 
Survival of breast cancer patients with S100P-positive carcinomas 
was significantly worse than that of patients negative for S100P (Wang et al, 
2006; Barraclough et al, 2009). S100P was also prominent 
among genes overexpressed in primary breast cancer cells 
from high-grade tumours (Dairkee et al, 2009). In contrast, 
gastric cancers that stain positive for S100P are associated with a 
better patient outcome than those that are negative for S100P 
(Jia et al, 2009). 

The S100P form detected in our study by MS on 
cation-exchange chips, and confirmed by MS after selective binding to 
immobilised S100P antibody, appeared at an m/z value of 9226. This 
contrasts with the expected size of mature S100P that contains 95 
amino acids and has a molecular mass of 10.4 kDa, suggesting that 
the observed S100P species detected by MS is a previously 
unreported truncated form of this protein. An amino-terminally 
truncated form of S100P, termed migration-inducing gene 9 
protein or MIG9, has been reported in GenBank (Protein 
Accession No. AAS00487.1), described as an alternatively spliced 
product. The predicted protein is identical to S100P [8-95] except 
for an isoleucine to methionine substitution at S100P residue 12 
(MIG9 residue 5), and has a predicted molecular mass of 9.64 kDa. 
If the true translation start site is methionine-5, the predicted 
molecular mass would be 9.21 kDa and could explain our observed 
peak on SELDI-TOF MS. Importantly, it is unlikely that the many 
immunohistochemical studies that have measured S100P 
distribution in patient tissues could distinguish between S100P and these 
truncated forms. Mass spectrometry would be the optimal method 
for this identification. We have therefore identified for the first 
time a novel isoform of S100P that is associated with pathologic 
markers in breast cancer. 
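
The mass argument in this paragraph amounts to simple arithmetic, which can be sketched as follows. This is an illustrative calculation only: the observed m/z and the three predicted masses are taken from the text (rounded as quoted there), while the assumption that the peak is a singly protonated ion, the proton mass constant, and all function and variable names are ours.

```python
# Observed peak and predicted masses as quoted in the text (masses in Da).
OBSERVED_MZ = 9226.0
CANDIDATES = {
    "mature S100P (95 aa)": 10400.0,
    "MIG9, S100P [8-95] with I12M": 9640.0,
    "MIG9 starting at methionine-5": 9210.0,
}

PROTON_MASS = 1.00728  # Da; used to convert m/z of [M+zH]z+ to a neutral mass


def neutral_mass(mz, charge=1):
    """Neutral mass for an ion observed at m/z, assuming protonation only."""
    return charge * (mz - PROTON_MASS)


def rank_candidates(observed_mz, candidates, charge=1):
    """Return (name, predicted mass, percent deviation), closest match first."""
    mass = neutral_mass(observed_mz, charge)
    rows = [
        (name, predicted, 100.0 * (mass - predicted) / predicted)
        for name, predicted in candidates.items()
    ]
    return sorted(rows, key=lambda row: abs(row[2]))


RANKED = rank_candidates(OBSERVED_MZ, CANDIDATES)

if __name__ == "__main__":
    for name, predicted, deviation in RANKED:
        print(f"{name:32s} {predicted:8.0f} Da  {deviation:+6.2f} %")
```

Assuming a singly charged ion, the methionine-5 start site gives the closest match, deviating from the observed peak by well under 1%, whereas the other two candidates differ by several percent.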

In conclusion, this study has discovered two protein 
biomarkers, ubiquitin and S100P - the latter as a novel truncated 
isoform - that, in combination, provide high discrimination 
between breast cancer tissue and healthy breast tissue. Correlation 
with clinical pathologic variables demonstrated that high 
values for the two-protein panel were associated with high 
histologic grade and tumour size, presence of LVI, ER- and 
PR-positive status, and HER2 overexpression. We propose that this 
independently validated protein biomarker panel may indicate a 
HER2-enriched breast cancer subtype with poor prognosis, 
and that measurement of S100P, in particular, may be valuable 
both in the classification of breast cancer and as a possible target 
for treatment. 